    On structural properties of the value function for an unbounded jump Markov process with an application to a processor sharing retrial queue

    The derivation of structural properties for unbounded jump Markov processes cannot be done using standard mathematical tools, since such systems are not uniformizable. We present a promising technique, the smoothed rate truncation method, that overcomes the limitations of standard techniques and allows the derivation of structural properties. We introduce this technique through an application to a processor sharing queue with impatient customers that can retry after they renege. We are interested in structural properties of the value function of the system as a function of the arrival rate.
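    The abstract does not spell out the construction, but the idea behind smoothed rate truncation can be illustrated on a toy model: rates that grow with the state are tapered off smoothly towards a truncation level, so the perturbed chain has a bounded total jump rate, can be uniformized, and standard value iteration applies. The sketch below is only an illustration under assumed dynamics (a birth-death queue with abandonment, linear holding cost, discounted criterion); it is not the paper's model.

        # Illustration only: all rates, the cost function and the discount factor
        # below are assumptions, not the model studied in the paper.
        import numpy as np

        lam, mu, theta = 1.0, 1.2, 0.3      # arrival, service, abandonment rates (assumed)
        N = 100                             # smoothed rate truncation level

        def arrival_rate(i):
            # Smoothly truncated arrival rate: tapers linearly to 0 at level N,
            # so the total jump rate is bounded and the chain is uniformizable.
            return lam * max(0.0, 1.0 - i / N)

        def total_rate(i):
            return arrival_rate(i) + mu * (i > 0) + theta * i

        LAM = max(total_rate(i) for i in range(N + 1))   # uniformization constant

        def value_iteration(cost=lambda i: i, alpha=0.5, iters=2000):
            V = np.zeros(N + 1)
            for _ in range(iters):
                W = np.empty_like(V)
                for i in range(N + 1):
                    up = arrival_rate(i) * V[min(i + 1, N)]
                    down = (mu * (i > 0) + theta * i) * V[max(i - 1, 0)]
                    stay = (LAM - total_rate(i)) * V[i]
                    W[i] = (cost(i) + up + down + stay) / (alpha + LAM)
                V = W
            return V

        V = value_iteration()
        # Numerically check a structural property of the truncated model:
        # convexity of the value function in the queue length.
        print(np.all(np.diff(V, 2) >= -1e-8))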

    Parameter-Independent Strategies for pMDPs via POMDPs

    Markov Decision Processes (MDPs) are a popular class of models for solving control decision problems in probabilistic reactive systems. We consider parametric MDPs (pMDPs), which include parameters in some of the transition probabilities to account for stochastic uncertainties of the environment such as noise or input disturbances. We study pMDPs with reachability objectives in which the parameter values are unknown and impossible to measure directly during execution, but a probability distribution over the parameter values is known. We study, for the first time, the computation of parameter-independent strategies that are expectation optimal, i.e., that optimize the expected reachability probability under the probability distribution over the parameters. We present an encoding of our problem into partially observable MDPs (POMDPs), i.e., a reduction of our problem to computing optimal strategies in POMDPs. We evaluate our method experimentally on several benchmarks: a motivating (repeated) learner model; a series of benchmarks of varying configurations of a robot moving on a grid; and a consensus protocol. (Comment: extended version of a QEST 2018 paper.)
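    To make "parameter-independent and expectation optimal" concrete, the toy sketch below enumerates the memoryless deterministic strategies of a small pMDP with a single parameter p and a discrete prior over p, and picks the strategy maximising the reachability probability averaged under that prior. The model, state names and prior are invented for illustration; the paper's encoding into POMDPs is more general than this brute-force enumeration.

        # Invented toy pMDP, not one of the paper's benchmarks.
        import itertools

        ACTIONS = ["a", "b"]
        PRIOR = {0.2: 0.25, 0.5: 0.5, 0.8: 0.25}    # known distribution over p

        def transitions(p):
            # Transition probabilities of the pMDP for a fixed parameter value p.
            return {
                ("s0", "a"): {"s1": p, "fail": 1 - p},
                ("s0", "b"): {"s1": 0.6, "fail": 0.4},
                ("s1", "a"): {"goal": 1 - p, "fail": p},
                ("s1", "b"): {"goal": 0.5, "fail": 0.5},
            }

        def reach_prob(strategy, p):
            # Probability of reaching 'goal' from 's0' in the Markov chain
            # induced by a memoryless (parameter-independent) strategy.
            T = transitions(p)
            x = {"goal": 1.0, "fail": 0.0}
            x["s1"] = sum(q * x[t] for t, q in T[("s1", strategy["s1"])].items())
            x["s0"] = sum(q * x[t] for t, q in T[("s0", strategy["s0"])].items())
            return x["s0"]

        best = None
        for a0, a1 in itertools.product(ACTIONS, repeat=2):
            strategy = {"s0": a0, "s1": a1}
            expected = sum(w * reach_prob(strategy, p) for p, w in PRIOR.items())
            if best is None or expected > best[1]:
                best = (strategy, expected)

        print(best)   # expectation-optimal parameter-independent strategy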

    Policy learning for time-bounded reachability in Continuous-Time Markov Decision Processes via doubly-stochastic gradient ascent

    Continuous-time Markov decision processes are an important class of models for applications ranging from cyber-physical systems to synthetic biology. A central problem is how to devise a policy to control the system so as to maximise the probability of satisfying a set of temporal logic specifications. Here we present a novel approach based on statistical model checking and an unbiased estimator of the functional gradient in the space of possible policies. The statistical approach has several advantages over conventional approaches based on uniformisation: it can also be applied when the model is replaced by a black box, and it does not suffer from state-space explosion. The use of a stochastic gradient to guide the search considerably improves the efficiency of learning policies. We demonstrate the method on a proof-of-principle non-linear population model, showing strong performance on a non-trivial task.
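    The abstract only names the estimator, so the following is a hedged sketch of the general idea under an assumed model: trajectories of a small CTMDP (a birth-death process) are simulated under a parametric softmax policy, each trajectory is scored by whether it reaches the goal state within the time bound, and the policy parameters are updated with an unbiased score-function (likelihood-ratio) gradient estimate. It is "doubly stochastic" in the sense that both the trajectories and the gradient estimate are sampled.

        # Sketch only: the model, rates, time bound and policy class are assumptions.
        import numpy as np

        rng = np.random.default_rng(0)
        M, T_BOUND = 5, 3.0                 # goal state and time bound (assumed)
        RATES = {"slow": 0.8, "fast": 2.0}  # per-action birth rates (assumed)
        DEATH = 0.5                         # death rate (assumed)
        ACTIONS = ["slow", "fast"]

        def policy_probs(theta, s):
            # Two-action softmax policy with one logit per non-goal state.
            z = np.array([0.0, theta[s]])
            e = np.exp(z - z.max())
            return e / e.sum()

        def simulate(theta):
            # One trajectory; returns (property satisfied?, score-function term).
            s, t, score = 0, 0.0, np.zeros_like(theta)
            while s != M:
                probs = policy_probs(theta, s)
                a = rng.choice(2, p=probs)
                score[s] += a - probs[1]    # d/dtheta[s] log pi(a|s) for the softmax
                up = RATES[ACTIONS[a]]
                down = DEATH if s > 0 else 0.0
                t += rng.exponential(1.0 / (up + down))
                if t > T_BOUND:             # next jump falls after the deadline
                    return 0.0, score
                s = s + 1 if rng.random() < up / (up + down) else s - 1
            return 1.0, score               # goal reached within the time bound

        theta = np.zeros(M)
        for step in range(200):             # stochastic gradient ascent
            batch = [simulate(theta) for _ in range(100)]
            grad = np.mean([sat * score for sat, score in batch], axis=0)
            theta += 0.5 * grad             # unbiased ascent direction
        print("estimated satisfaction probability:",
              np.mean([sat for sat, _ in batch]))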

    Nonzero-sum Stochastic Games

    This paper treats stochastic games. We focus on nonzero-sum games and provide a detailed survey of selected recent results. In Section 1, we consider stochastic Markov games. A correlation of the players' strategies, involving ``public signals'', is described, and a correlated equilibrium theorem recently proved by Nowak and Raghavan for discounted stochastic games with general state space is presented. We also report an extension of this result to a class of undiscounted stochastic games satisfying a uniform ergodicity condition. Stopping games are related to stochastic Markov games. In Section 2, we describe a version of Dynkin's game related to observation of a Markov process with a random assignment mechanism of states to the players. Some recent contributions of the second author in this area are reported. The paper also contains a brief overview of the theory of nonzero-sum stochastic games and stopping games, which is far from complete.
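    For orientation only, the standard textbook formulation of the objects mentioned above (not the precise statement of the theorem by Nowak and Raghavan): in a discounted stochastic game with state space X, stage payoffs r_i, transition kernel q and discount factor beta in (0,1), player i's payoff under a strategy profile pi and the equilibrium condition against unilateral deviations sigma_i read

        J_i(x, \pi) \;=\; \mathbb{E}_x^{\pi}\Big[\sum_{n=0}^{\infty} \beta^{n}\, r_i(x_n, a_n)\Big],
        \qquad
        J_i(x, \pi^{*}) \;\ge\; J_i\big(x, (\sigma_i, \pi^{*}_{-i})\big)
        \quad \text{for all } i,\ x,\ \sigma_i,

    where, in the correlated case, the profile pi* may additionally condition on a public signal drawn before each stage and observed by all players.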

    Approximate Linear Programming for Average Cost MDPs
